Parallel Processing of cluster by Map Reduce

Author

  • Madhavi Vaidya
Abstract

MapReduce is a parallel programming model and an associated implementation introduced by Google. In this model, a user specifies the computation with two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation and handles complicated issues such as data distribution, load balancing, and fault tolerance. Massive inputs, spread across many machines, need to be processed in parallel; the library moves the data as needed and provides scheduling and fault tolerance. The original MapReduce implementation by Google, as well as its open-source counterpart, Hadoop, is aimed at parallelizing computation on large clusters of commodity machines. MapReduce has gained great popularity because it gracefully and automatically achieves fault tolerance: it handles the gathering of results across multiple nodes and returns a single result or result set. This paper gives an overview of the MapReduce programming model and its applications. The author describes the workflow of the MapReduce process and studies some important issues, such as fault tolerance, in more detail. An illustration of the working of MapReduce is also given. The data locality issue in heterogeneous environments can noticeably reduce MapReduce performance. The author therefore also addresses the distribution of data across nodes so that each node carries a balanced data-processing load, stored in a parallel manner. Given a data-intensive application running on a Hadoop MapReduce cluster, the author exemplifies how data placement is done in the Hadoop architecture and the role of MapReduce in that architecture. The amount of data stored on each node to achieve improved data-processing performance is also explained.
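As a concrete illustration of the model described above, the following is a minimal, self-contained Python sketch of the canonical word-count job. It simulates the map, shuffle, and reduce phases in a single process; the names map_phase, reduce_phase, and run_job are illustrative assumptions, not the Hadoop API, which runs these same phases distributed across the nodes of a cluster.

```python
from collections import defaultdict

# Minimal word-count sketch of the MapReduce model described above.
# A real Hadoop cluster runs each phase on many nodes; here all three
# phases (map, shuffle, reduce) execute in a single process.

def map_phase(document):
    """Map: emit one intermediate (key, value) pair per word."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    """Reduce: aggregate all values that share the same key."""
    return (word, sum(counts))

def run_job(documents):
    # Shuffle: group intermediate values by key, as the MapReduce
    # library would do between the map and reduce phases.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

if __name__ == "__main__":
    docs = ["map reduce on a cluster", "a cluster runs map tasks"]
    print(run_job(docs))
    # {'map': 2, 'reduce': 1, 'on': 1, 'a': 2, 'cluster': 2,
    #  'runs': 1, 'tasks': 1}
```

Because each map call depends only on its own input record and each reduce call only on the values sharing one key, the library is free to schedule both phases on any node and to re-run them after a failure, which is how MapReduce achieves the automatic parallelism and fault tolerance discussed in the abstract.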


Similar Articles

Survey of Parallel Data Processing in Context with MapReduce

MapReduce is a parallel programming model and an associated implementation introduced by Google. In the programming model, a user specifies the computation by two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation, and handles complicated issues like data distribution, load balancing and fault tolerance. The original MapReduce implementation b...


Parallel and Scalable Rules Based Classifier Using Map-Reduce Paradigm on Hadoop Cloud

A huge amount of data is being generated by today's data acquisition and processing technologies. Extracting hidden information from such huge datasets has become practically impossible; even so, several data mining tasks like classification, association rule mining, clustering, etc. are used for information extraction. The data mining task of classification consists of assigning a class to a s...


Parallelizing Wavelet Transformation with a Reduction Style Framework

There is an increasing trend towards parallel processing and data processing using cloud environments. Simple APIs for expressing parallelism, such as map-reduce, are popular in cloud environments, but their expressibility is generally considered limited. This paper focuses on how a signal processing algorithm, wavelet transform, can be parallelized in a cloud environment comprising a cluster o...


Large Scale Metagenomic Sequence Clustering via Sketching and Maximal Quasi-clique Enumeration on Map-Reduce Clusters

Taxonomic clustering of species from millions of DNA fragments sequenced from their genomes is an important and frequently arising problem in metagenomics. High-throughput next generation sequencing is enabling the creation of large metagenomic samples, while at the same time making the clustering problem harder due to the short sequence length supported and sampling of hitherto unknown species...


Analysing Distributed Big Data through Hadoop Map Reduce

This term paper focuses on how big data is analysed in a distributed environment through Hadoop Map Reduce. Big Data is the same as "small data", only bigger in size; thus, it is approached in different ways. Storage of Big Data requires analysing the characteristics of the data. It can be processed by employing Hadoop Map Reduce, a programming model working in parallel for large c...



Journal:

Volume   Issue

Pages  -

Publication date: 2012